- Metadata
- For a project predicting the likelihood of collisions between trucks and other road users, the topic of confidence intervals came up
- The [[bootstrap-algorithm]] technique can be used to quantify the level of confidence around regression coefficients and predictions
- Bootstrap algorithm to understand this
- Discussion with managers about how to implement this
- Draw a random sample (~80% of the data) to use as training data
- Normalize the input data (so that the coefficients of a linear regression are comparable across variables)
- Fit the model on the training data and record the estimated coefficients
- Repeat n times
- Note the variability of each estimated coefficient across the repetitions and use it to guide variable selection
- Apply the model by drawing from the collected coefficient estimates, so that the predictions reflect the variability of the coefficients
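The steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual implementation: the function names `bootstrap_coefficients` and `prediction_interval`, the synthetic data, the use of ordinary least squares via NumPy, and the choice to draw rows with replacement are all assumptions made for the example.

```python
import numpy as np


def bootstrap_coefficients(X, y, n_rounds=500, train_frac=0.8, seed=0):
    """Refit a least-squares linear model on repeated resamples.

    Returns the per-round coefficients (intercept first) together with the
    per-round normalization constants needed to apply them to new inputs.
    """
    rng = np.random.default_rng(seed)
    n_rows, n_features = X.shape
    n_train = int(train_frac * n_rows)
    coefs = np.empty((n_rounds, n_features + 1))
    means = np.empty((n_rounds, n_features))
    stds = np.empty((n_rounds, n_features))
    for i in range(n_rounds):
        # Assumption: rows are drawn with replacement, as in a classic bootstrap.
        idx = rng.choice(n_rows, size=n_train, replace=True)
        X_s, y_s = X[idx], y[idx]
        # Normalize the inputs using statistics of this training sample only.
        mean = X_s.mean(axis=0)
        std = X_s.std(axis=0)
        std = np.where(std == 0, 1.0, std)  # guard against constant columns
        Z = (X_s - mean) / std
        design = np.column_stack([np.ones(n_train), Z])
        coefs[i] = np.linalg.lstsq(design, y_s, rcond=None)[0]
        means[i], stds[i] = mean, std
    return coefs, means, stds


def prediction_interval(x_new, coefs, means, stds, level=0.95):
    """Percentile interval of the predictions across all bootstrap fits."""
    z = (x_new - means) / stds  # shape: (n_rounds, n_features)
    preds = coefs[:, 0] + (z * coefs[:, 1:]).sum(axis=1)
    tail = 100 * (1 - level) / 2
    return np.percentile(preds, [tail, 100 - tail])


# Synthetic data for illustration only (not the collision data).
rng = np.random.default_rng(42)
X = rng.normal(size=(400, 2))
y = 2.0 + 3.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.5, size=400)

coefs, means, stds = bootstrap_coefficients(X, y)
# 95% percentile interval for each coefficient (intercept, x0, x1).
coef_low, coef_high = np.percentile(coefs, [2.5, 97.5], axis=0)
# Interval for the predicted mean response at a new input.
pred_low, pred_high = prediction_interval(np.array([0.0, 0.0]), coefs, means, stds)
```

A coefficient whose interval is wide or straddles zero is a candidate to drop during variable selection. Note that the prediction interval here reflects only the uncertainty in the coefficients, not the noise around individual observations.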
- Designing a statistical test
- For a given scenario, we would like to test whether some intervention has caused a measurable change in an outcome
- Null hypothesis: no observable change
- Alternative hypothesis: observable change
- There are four properties we care about: power, significance, sample size, effect size
- These are interrelated with each other, and we can solve for any individual property if we define the other 3
- Power (1 - β)
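- Is the probability of rejecting the null hypothesis when the alternative hypothesis is true
- β is the probability of a False Negative (Type II error), i.e. failing to reject the null hypothesis when it is false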
- Significance level (α)
- Is the probability of rejecting the null hypothesis when it is actually true (False Positive or Type I error)
- This is usually set at 0.05, giving a confidence level (1−α) of 95%
- Sample size (n)
- As the sample size increases, the power increases even if the significance level is held constant, because the variance of the estimated mean becomes smaller
- Effect size (e)
- The separation between the means of the two distributions, often expressed in units of standard deviation
- If the effect is small, then to increase the power one has to either collect a larger sample or relax (increase) the significance level
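The relationship between the four properties can be made concrete with a small sketch. It assumes a two-sided, two-sample z-test with equal group sizes and an effect size standardized by the standard deviation, which the notes do not specify; under the usual normal approximation the sample size per group is $n \approx 2\left((z_{1-\alpha/2} + z_{1-\beta})/e\right)^2$. The function names `power_two_sample` and `sample_size_two_sample` are made up for this example, and only the Python standard library is used.

```python
from math import ceil, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def power_two_sample(effect_size, n_per_group, alpha=0.05):
    """Power of a two-sided, two-sample z-test (normal approximation)."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    # Expected value of the test statistic when the effect is real.
    shift = abs(effect_size) * sqrt(n_per_group / 2)
    # Probability of landing in either rejection region.
    return STD_NORMAL.cdf(shift - z_crit) + STD_NORMAL.cdf(-shift - z_crit)


def sample_size_two_sample(effect_size, power=0.8, alpha=0.05):
    """Smallest per-group sample size that reaches the requested power."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_power = STD_NORMAL.inv_cdf(power)
    return ceil(2 * ((z_crit + z_power) / abs(effect_size)) ** 2)


# Fix three properties and solve for the fourth.
n_needed = sample_size_two_sample(effect_size=0.5, power=0.8, alpha=0.05)
achieved_power = power_two_sample(effect_size=0.5, n_per_group=n_needed)

# A smaller effect at the same sample size has lower power,
# and relaxing the significance level recovers some of it.
power_small_effect = power_two_sample(effect_size=0.2, n_per_group=n_needed)
power_relaxed_alpha = power_two_sample(effect_size=0.2, n_per_group=n_needed, alpha=0.10)
```

A t-test based calculation would give slightly larger sample sizes than this z-approximation, so treat the numbers as a first estimate.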